word2vec Skip-Gram with Negative Sampling is a Weighted Logistic PCA

Authors

  • Andrew J. Landgraf
  • Jeremy Bellay
Abstract

Mikolov et al. (2013) introduced the skip-gram formulation for neural word embeddings, in which one tries to predict the context of a given word. Their negative-sampling algorithm made training the embeddings computationally feasible. Because these embeddings achieve state-of-the-art performance on a number of tasks, there has been much research aimed at better understanding them. Goldberg and Levy (2014) showed that the skip-gram with negative-sampling algorithm (SGNS) maximizes a different likelihood than the one posed by the skip-gram formulation, and further showed how it is implicitly related to pointwise mutual information (Levy and Goldberg, 2014). We show that SGNS is a weighted logistic PCA, a special case of exponential family PCA with a binomial likelihood. Cotterell et al. (2017) showed that the skip-gram formulation can be viewed as exponential family PCA with a multinomial likelihood, but did not make the connection between the negative-sampling algorithm and the binomial likelihood. Li et al. (2015) showed that SGNS is an explicit matrix factorization related to representation learning, but the factorization objective they derived is complicated, and they did not connect it to the binomial distribution or to exponential family PCA.
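To make the equivalence concrete, here is a minimal numerical sketch (our own illustration, not the authors' code): the SGNS objective over a word-context co-occurrence matrix is a weighted Bernoulli log-likelihood whose natural-parameter matrix is constrained to be low rank, i.e. weighted logistic PCA. The function name is ours, and expected negative counts stand in for sampled ones.

    import numpy as np

    def sgns_as_weighted_logistic_pca(counts, U, V, k=5):
        """Negative SGNS objective written as a weighted binomial (logistic PCA)
        deviance. counts[i, j] is the co-occurrence count #(i, j); U and V are
        the word and context factors, so Theta = U @ V.T is the low-rank
        natural-parameter matrix. Expected negatives replace sampled ones."""
        total = counts.sum()                                      # corpus size |D|
        pos = counts                                              # positive weights #(i, j)
        neg = k * np.outer(counts.sum(1), counts.sum(0)) / total  # expected negatives
        theta = U @ V.T
        log_sig_pos = -np.logaddexp(0.0, -theta)   # log sigmoid(theta), numerically stable
        log_sig_neg = -np.logaddexp(0.0, theta)    # log sigmoid(-theta)
        # Grouping the two terms gives a binomial likelihood with total weight
        # w = pos + neg and target p = pos / w: exactly weighted logistic PCA.
        return -(pos * log_sig_pos + neg * log_sig_neg).sum()

Minimizing this loss over low-rank U and V is then one way to read what SGNS training does.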

Similar papers

Riemannian Optimization for Skip-Gram Negative Sampling

The Skip-Gram Negative Sampling (SGNS) word embedding model, well known through its implementation in the “word2vec” software, is usually optimized by stochastic gradient descent. However, optimizing the SGNS objective can be viewed as the problem of searching for a good matrix under a low-rank constraint. The most standard way to solve this type of problem is to apply the Riemannian optimization framework...
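As a rough illustration of the low-rank viewpoint (our sketch, not the paper's projector-splitting scheme), one can alternate an ambient gradient step with a retraction back onto the manifold of rank-d matrices via truncated SVD:

    import numpy as np

    def rank_d_step(X, grad, d, lr=0.1):
        """One naive optimization step over rank-d matrices: follow the
        Euclidean gradient of the objective, then retract to the manifold
        with a truncated SVD (a generic retraction, not the paper's exact one)."""
        Y = X - lr * grad                                 # step in the ambient space
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)  # factor the stepped matrix
        return (U[:, :d] * s[:d]) @ Vt[:d]                # best rank-d approximation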

Linking GloVe with word2vec

Global Vectors for word representation (GloVe), introduced by Jeffrey Pennington et al. [3], is reported to be an efficient and effective method for learning vector representations of words. State-of-the-art performance is also provided by skip-gram with negative sampling (SGNS) [2], implemented in the word2vec tool. In this note, we explain the similarities between the training objectives of...
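For reference, here is a compact sketch of the GloVe objective that such a comparison starts from (our rendering of the published formula; the array names are ours):

    import numpy as np

    def glove_loss(X, W, C, b_w, b_c, x_max=100.0, alpha=0.75):
        """GloVe weighted least-squares objective (Pennington et al.):
        sum_ij f(X_ij) * (w_i . c_j + b_i + b_j - log X_ij)^2, where
        f(x) = min(x / x_max, 1) ** alpha. Zero counts receive zero weight."""
        f = np.minimum(X / x_max, 1.0) ** alpha
        pred = W @ C.T + b_w[:, None] + b_c[None, :]
        logX = np.log(np.where(X > 0, X, 1.0))  # safe log; masked where f is zero
        return np.sum(f * (pred - logX) ** 2)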

Information-Theory Interpretation of the Skip-Gram Negative-Sampling Objective Function

In this paper, we define a measure of dependency between two random variables, based on the Jensen-Shannon (JS) divergence between their joint distribution and the product of their marginal distributions. We then show that word2vec’s skip-gram with negative sampling embedding algorithm finds the optimal low-dimensional approximation of this JS dependency measure between the words and their contexts...
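A minimal sketch of such a dependency measure (our own implementation of the standard JS divergence applied to a joint count table, not the paper's code):

    import numpy as np

    def js_dependency(joint_counts, eps=1e-12):
        """Jensen-Shannon divergence between the joint distribution P(w, c)
        and the product of its marginals P(w)P(c): zero under independence,
        larger values indicating stronger word-context dependency."""
        P = joint_counts / joint_counts.sum()       # joint distribution
        Q = np.outer(P.sum(axis=1), P.sum(axis=0))  # product of marginals
        M = 0.5 * (P + Q)                           # mixture midpoint
        kl = lambda a, b: np.sum(a * (np.log(a + eps) - np.log(b + eps)))
        return 0.5 * kl(P, M) + 0.5 * kl(Q, M)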

Modeling Musical Context with Word2vec

We present a semantic vector space model for capturing complex polyphonic musical context. A word2vec model based on a skip-gram representation with negative sampling was used to model slices of music from a dataset of Beethoven’s piano sonatas. A visualization of the reduced vector space using t-distributed stochastic neighbor embedding shows that the resulting embedded vector space captures t...
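The setup can be reproduced in outline with an off-the-shelf word2vec implementation. A hedged sketch using gensim, with each encoded polyphonic slice treated as a token (the corpus contents and hyperparameters below are placeholders, not the paper's):

    from gensim.models import Word2Vec

    # Each "sentence" is a sequence of encoded music slices (placeholder data).
    corpus = [["slice_C_E_G", "slice_C_F_A", "slice_B_D_G"],
              ["slice_A_C_E", "slice_G_B_D", "slice_C_E_G"]]

    # sg=1 selects skip-gram; negative=5 enables negative sampling.
    model = Word2Vec(corpus, vector_size=64, sg=1, negative=5,
                     window=4, min_count=1, epochs=50)
    print(model.wv.most_similar("slice_C_E_G", topn=2))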

Streaming Word Embeddings with the Space-Saving Algorithm

We develop a streaming (one-pass, bounded-memory) word embedding algorithm based on the canonical skip-gram with negative sampling algorithm implemented in word2vec. We compare our streaming algorithm to word2vec empirically, by measuring the cosine similarity between word pairs under each algorithm and by applying each algorithm in the downstream task of hashtag prediction on a two-month interval...
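For context, the Space-Saving algorithm (Metwally et al.) referenced in the title maintains a fixed number of counters over a stream, which is what bounds the vocabulary of a streaming embedding; a minimal sketch (our illustration):

    def space_saving(stream, m):
        """Track roughly the m most frequent items of a stream in O(m) memory:
        an unseen item evicts the current minimum counter and inherits its
        count plus one, so estimates overestimate true counts by at most the
        smallest counter's value."""
        counters = {}
        for item in stream:
            if item in counters:
                counters[item] += 1
            elif len(counters) < m:
                counters[item] = 1
            else:
                victim = min(counters, key=counters.get)
                counters[item] = counters.pop(victim) + 1
        return counters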

Journal title:
  • CoRR

Volume abs/1705.09755

Pages –

Publication date 2017